Gisele Fox, Emiri Nishizawa, Melina Perraut, Roshni Srikanth, Ha Nhat To
Movies have been an important aspect of society since their creation. They portray issues that reflect our culture and current events, while also providing entertainment. As a society, we have been witnessing the severe impact of the COVID-19 pandemic on the movie industry, from movie theater closures to film releases being delayed or pushed straight to streaming services. This project seeks to investigate the impact that global events, including the COVID-19 pandemic, have had on a variety of factors within the movie industry. The data for this project is mainly taken from the IMDb Movies Extensive Database and supplemented with the IMDB 5000 Movie Dataset. Both of these datasets were populated with data from IMDb. The datasets were combined by matching movie titles, and then filtered to include only each movie's title, genre, gross income in the US, budget, language, country, average vote received, number of reviews from critics, and year published.
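As a rough illustration of the combine-and-filter step described above, here is a minimal sketch using only the Python standard library. The rows, the column names `movie_title` and `title`, the helper names, and the title normalization are all assumptions made for the example; the project's actual merge may have been done differently.

```python
# Hypothetical sample rows; the real datasets have many more rows and columns.
extensive_rows = [
    {"title": "Movie A", "year": 2001, "genre": "Drama", "budget": 1000000, "duration": 95},
    {"title": "Movie B", "year": 2005, "genre": "Action", "budget": 5000000, "duration": 120},
]
imdb5000_rows = [
    {"movie_title": "movie a ", "plot_keywords": "family|loss"},
    {"movie_title": "Movie C", "plot_keywords": "space|robot"},
]

# Columns to keep after merging (a subset, chosen for the example).
KEEP = ["title", "genre", "budget", "year", "plot_keywords"]


def normalize_title(title):
    """Trim whitespace and lowercase so near-identical titles match."""
    return title.strip().lower()


def merge_on_title(primary, secondary, keep):
    """Inner-join two lists of dicts on normalized title, keeping selected columns."""
    lookup = {normalize_title(row["movie_title"]): row for row in secondary}
    merged = []
    for row in primary:
        match = lookup.get(normalize_title(row["title"]))
        if match is None:
            continue  # title appears in only one dataset, so it is dropped
        combined = {**match, **row}  # values from the primary dataset win on conflicts
        merged.append({col: combined[col] for col in keep})
    return merged


merged_rows = merge_on_title(extensive_rows, imdb5000_rows, KEEP)
print(merged_rows)
```

Matching on title alone can pair up different movies that share a name (remakes, for instance), so a key that also includes the release year would likely be safer.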
The midpoint_df.csv file is a data set compiled from the IMDB 5000 Movie Dataset and the IMDb Movies Extensive Database for the calculations and data visualizations in this deliverable. It contains data on 1637 movies produced in the US, or in conjunction with the US, from 1990 to 2020. The data set has 12 variables, including plot_keywords, genre, usa_gross_income, budget, language, country, avg_vote, reviews_from_critics, date_published, year, and title_and_year.
About Table: Many of our questions revolve around how movie statistics change from year to year. Thus, our summary table reports, for each year from 1990 to 2020, the number of movies published, the average budget, the average USA gross income, and the most and least common genres.
Insight: Summarizing our large data set gives us more specific, cleaned-up information. For example, this process revealed that our data pool had only one movie for 1992, and that movie did not report a budget. The table also displays trends across years. From 1993 to 2015, Drama was the most common genre of released movies. The least common genre varies more from year to year, although History comes up often. Other than 1992 and 2018, the average budget of movies is in the tens of millions. Again, 1992 had only one movie, with no reported budget, while 2018 has an average budget significantly lower than that of the surrounding years. We can also see that 2020 had a significant decrease in average USA gross income. Given the timing of the COVID-19 pandemic, this suggests that global events may have an impact on the movie industry.
| Year | Number of Movies Published | Average Budget | Average USA Gross Income | Most Common Genre | Least Common Genre |
|---|---|---|---|---|---|
| 1990 | 6 | 14205000 | 55549289 | Comedy | Action |
| 1991 | 5 | 15694600 | 17693069 | Drama | Action |
| 1992 | 1 | NaN | 553171 | Action | Action |
| 1993 | 9 | 29187500 | 82194530 | Drama | Animation |
| 1994 | 18 | 35307692 | 81764922 | Drama | Animation |
| 1995 | 20 | 39078947 | 50213473 | Drama | Animation |
| 1996 | 30 | 38945385 | 53125653 | Drama | Family |
| 1997 | 37 | 44958065 | 46053871 | Drama | Family |
| 1998 | 50 | 28253488 | 36550476 | Drama | Fantasy |
| 1999 | 41 | 35045588 | 30532869 | Drama | Biography |
| 2000 | 68 | 32853279 | 29899064 | Drama | History |
| 2001 | 68 | 34096143 | 40879155 | Drama | Musical |
| 2002 | 86 | 33328267 | 41113047 | Drama | History |
| 2003 | 79 | 27669486 | 29988844 | Drama | Animation |
| 2004 | 65 | 38646542 | 47247506 | Drama | Musical |
| 2005 | 98 | 40062415 | 33537240 | Drama | War |
| 2006 | 96 | 34468125 | 25528268 | Drama | Music |
| 2007 | 85 | 37419787 | 42569300 | Drama | Musical |
| 2008 | 78 | 33990769 | 40956521 | Drama | History |
| 2009 | 81 | 40627029 | 46274694 | Drama | History |
| 2010 | 105 | 36840266 | 44933600 | Drama | History |
| 2011 | 102 | 33237211 | 30267374 | Drama | History |
| 2012 | 92 | 42853995 | 53299935 | Drama | History |
| 2013 | 75 | 55074094 | 52736999 | Drama | Sport |
| 2014 | 89 | 38249200 | 41272191 | Drama | War |
| 2015 | 72 | 44742656 | 70895405 | Drama | Fantasy |
| 2016 | 55 | 60447647 | 56733594 | Action | Family |
| 2017 | 7 | 32000000 | 12497767 | Drama | Action |
| 2018 | 5 | 7600000 | 95947786 | Crime | Family |
| 2019 | 8 | 47428571 | 58982142 | Action | Biography |
| 2020 | 6 | 19250000 | 9301084 | Horror | Adventure |
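A summary like the one above could be produced by grouping the movies by year and aggregating each group. The sketch below is only an assumption about how that might look: the rows are made up, the field names follow the variables listed for midpoint_df.csv, and it assumes that a movie's `genre` field can hold several comma-separated genres.

```python
from collections import Counter, defaultdict
from statistics import mean

# Made-up rows for illustration only; budget is None when it was not reported.
movies = [
    {"year": 1992, "genre": "Action", "budget": None, "usa_gross_income": 500},
    {"year": 1993, "genre": "Drama, Romance", "budget": 100, "usa_gross_income": 300},
    {"year": 1993, "genre": "Drama", "budget": 300, "usa_gross_income": 500},
]


def summarize_by_year(rows):
    """Return per-year counts, averages, and most/least common genres."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row)

    summary = {}
    for year in sorted(by_year):
        group = by_year[year]
        # Skip missing budgets so they do not distort the average.
        budgets = [r["budget"] for r in group if r["budget"] is not None]
        # Count every genre listed on every movie released that year.
        genres = Counter(g.strip() for r in group for g in r["genre"].split(","))
        ranked = genres.most_common()  # ordered from most to least frequent
        summary[year] = {
            "movies": len(group),
            "avg_budget": mean(budgets) if budgets else float("nan"),
            "avg_usa_gross": mean(r["usa_gross_income"] for r in group),
            "most_common_genre": ranked[0][0],
            "least_common_genre": ranked[-1][0],
        }
    return summary


summary = summarize_by_year(movies)
for year, stats in summary.items():
    print(year, stats)
```

Under these assumptions, a year with a single single-genre movie and no reported budget yields a missing average budget and the same genre as both most and least common, which is the pattern the 1992 row shows.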
This bar chart depicts the average US revenue of movies in our data set by year. Looking at the chart, we can see a dip in average earnings from 1999 to 2011, but the largest drop is from 2018 to 2020. Even though the dataframe lists more movies from 2020 than from 2018, there is still a massive drop in average earnings, potentially due to the coronavirus pandemic.
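The size of that drop can be checked against the Average USA Gross Income column of the summary table. The sketch below copies those values by hand and computes the year-over-year changes and the overall 2018 to 2020 decline; the variable names are ours, and this is only one way to quantify the drop.

```python
# Average USA gross income per year, copied from the summary table.
avg_gross = {
    1990: 55549289, 1991: 17693069, 1992: 553171, 1993: 82194530,
    1994: 81764922, 1995: 50213473, 1996: 53125653, 1997: 46053871,
    1998: 36550476, 1999: 30532869, 2000: 29899064, 2001: 40879155,
    2002: 41113047, 2003: 29988844, 2004: 47247506, 2005: 33537240,
    2006: 25528268, 2007: 42569300, 2008: 40956521, 2009: 46274694,
    2010: 44933600, 2011: 30267374, 2012: 53299935, 2013: 52736999,
    2014: 41272191, 2015: 70895405, 2016: 56733594, 2017: 12497767,
    2018: 95947786, 2019: 58982142, 2020: 9301084,
}

years = sorted(avg_gross)

# Change in average gross between each pair of consecutive years.
changes = {(a, b): avg_gross[b] - avg_gross[a] for a, b in zip(years, years[1:])}

# The most negative change is the largest single-year drop.
largest_drop = min(changes, key=changes.get)

# Overall decline from 2018 to 2020, as a percentage of the 2018 value.
pct_drop_2018_2020 = (avg_gross[2018] - avg_gross[2020]) / avg_gross[2018] * 100

print(largest_drop, changes[largest_drop])
print(round(pct_drop_2018_2020, 1))
```

With the table's values, the largest single-year drop is from 2019 to 2020, and the average falls by roughly 90% between 2018 and 2020. Both years contain only a handful of movies (5 and 6), so these averages are sensitive to individual titles.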
This line graph depicts the search interest in 5 pandemic-related movies, relative to the highest point on the chart for their given time frame. I chose this chart because it clearly shows the popularity of these movies over time (from one year after their release date to November 2020) and helps give a sense of how quarantine brought these movies back into the spotlight. From this graph we can clearly see a boost in attention in March 2020 towards movies that deal with a pandemic, even those that are more fiction than what could happen in reality (see I Am Legend and 28 Days Later, which are notably zombie movies). Every movie's search interest more than doubled in March 2020 relative to November 2019, with Contagion showing the largest change, a 9900% increase over that time frame. The chart makes clear that, excluding the first year after each release, the month in which these movies were searched the most was the month the United States began implementing quarantine orders to keep people inside.
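To make the "relative to the highest point" scaling and the percent-increase figure concrete, here is a small sketch. The monthly counts are hypothetical, since the underlying search data is not reproduced in this report; they were picked so that the scaled values go from 1 to 100, the kind of change that a 9900% increase corresponds to.

```python
def scale_to_peak(counts):
    """Rescale values so that the largest one becomes 100."""
    peak = max(counts.values())
    return {month: 100 * value / peak for month, value in counts.items()}


def percent_increase(old, new):
    """Percent change from old to new, relative to old."""
    return (new - old) / old * 100


# Hypothetical raw monthly search counts for one movie.
raw_counts = {"2019-11": 40, "2020-01": 400, "2020-03": 4000}

scaled = scale_to_peak(raw_counts)
increase = percent_increase(scaled["2019-11"], scaled["2020-03"])

print(scaled)
print(increase)
```

Because every value is divided by the same peak, the scaling does not change the percent increase: the raw counts 40 and 4000 give the same result as the scaled values 1 and 100.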